Are Aftershocks of Large Californian Earthquakes Diffusing? 
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Abstract 

We analyze 21 aftershock sequences of California to test for evidence of space-time diffusion. 
Aftershock diffusion may result from stress diffusion and is also predicted by any mechanism of 
stress weakening. Here, we test an alternative mechanism to explain aftershock diffusion, based 
on multiple cascades of triggering. In order to characterize aftershock diffusion, we develop two 
methods, one based on a suitable time and space windowing, the other using a wavelet transform 
adapted to the removal of background seismicity. Both methods confirm that diffusion of seismic 
activity is very weak, much weaker than reported in previous studies. A possible mechanism 
explaining the weakness of observed diffusion is the effect of geometry, including the localization 
of aftershocks on a fractal fault network and the impact of extended rupture lengths which 
control the typical distances of interaction between earthquakes. 



2 



1. Introduction 

Aftershocks are defined by their clustering proper- 
ties both in time and space. The temporal clustering 
obeys the well established (modified) Omori law [ Utsu 
et al, 1995] which states that the rate of aftershocks 
decays as 



where the Omori exponent p is generally found be- 
tween 0.5 and 2. In expression (1), the origin of time is 
the occurrence of the main event. The offset time c is 
often not well-constrained and may partly account for 
the incompleteness of catalogs close to mainshocks. 
It provides a regularization at short times prevent- 
ing the event rate to reach infinity at the time of the 
mainshock. This law with the cut-off c may be derived 
from a variety of physical mechanism (e.g. [Dieterich, 
1994]). 

The spatial organization of aftershocks is more 
complex and less understood. On the one-hand, it is 
recognized that many aftershocks occur right on the 
fault plane or in its immediate vicinity (at the scale of 
the mainshock rupture length) and this is used for 3D 
visualization of the rupture surface (see for instance 
Fukuyama [1991]). The observation that the spatial 
distribution of earthquakes in California is character- 
ized by a fractal dimension close to 2.2 [Kagan and 
Knopoff, 1980] and the fact that this measure is domi- 
nated by aftershock clustering suggests a rather com- 
plex network of active faults [Sornette, 1991]. The 
clustering of aftershocks on or close to the mainshock 
rupture probably reflects local stress concentration at 
asperities that served to stop the rupture and lock 
the fault. On the other hand, aftershocks also occur 
away from fault ruptures, due to various triggering 
mechanism, the simplest one being stress transfer and 
increasing Coulomb stress [King et al, 1994; Stein et 
al, 1994; Stein, 1999; Toda et al, 2002]. 

Both temporal and spatial properties argue for the 
presence of triggering processes, which could also be 
expected to lead to combined space and time depen- 
dence of the organization of aftershocks. Indeed, sev- 
eral studies have reported so-called "aftershock dif- 
fusion," the phenomenon of expansion or migration 
of aftershock zone with time [Mogi, 1968; Imoto, 
1981; Chatelain et al, 1983; Tajima and Kanamori, 
1985a, b; Ouchi and Uekawa, 1986; Wesson, 1987; Ry- 
delek and Sacks, 1990; Noir et al, 1997; Jacques et 
al, 1999]. Immediately after the mainshock occur- 
rence, most aftershocks are located close to the rup- 



ture plane of the mainshock, then aftershocks may in 
some cases migrate away from the mainshock, at ve- 
locity ranging from 1 km/h to 1 km/year [Jacques et 
al, 1999; Rydelek and Sacks, 2001]. This expansion 
is not universally observed, but is more important 
in some areas than in others [Tajima and Kanamori, 
1985a]. Most of these studies are qualitative and are 
based on a few sequences at most. Marsan et al. 
[1999, 2000] and Marsan and Bean [2003] have de- 
veloped what is to our knowledge the first systematic 
method to analyze space-time interactions between 
pairs of earthquake and report evidence for strong 
diffusion (see our discussion below on these results). 
A similar method has also been used by Hue and 
Main [2003]. The present state of knowledge on af- 
tershock diffusion is confusing because contradictory 
results have been obtained, some showing almost sys- 
tematic diffusion whatever the tectonic setting and 
in many areas in the world, while others do not find 
evidences for aftershock diffusion [Shaw, 1993]. This 
may reflect the intrinsic variability in space and time 
of genuine diffusion properties and also possible bi- 
ases of the analyses, for example, due to background 
scismicity. 

From a physical viewpoint, the diffusion of after- 
shocks is usually interpreted as a diffusion of the stress 
induced by the mainshock, either by a viscous relax- 
ation process [Rydelek and Sacks, 2001], or due to 
fluid transfer in the crust [Nur and Booker, 1972; 
Hudnut et al, 1989; Noir et al, 1997]. However, 
such a stress diffusion process is not necessary to ex- 
plain aftershock diffusion. Recent studies have in- 
deed suggested that aftershock diffusion may result 
from either a rate and state friction model [Dieterich, 
1994; Marsan et al, 2000] or a sub-critical growth 
mechanism [Hue and Main, 2003], without invoking 
any process of stress diffusion. Actually, aftershock 
diffusion is predicted by any model that assumes 
that (i) the time to failure increases if the applied 
stress decreases and (ii) the stress change induced by 
the mainshock decreases with the distance from the 
mainshock. Therefore, aftershocks further away from 
the mainshock will occur later than those closer to 
the mainshock, because the stress change is smaller 
and thus the failure time is larger. An increase of 
time to failure with the applied stress is predicted by 
many models of aftershocks, including rate and state 
friction [Dieterich, 1994], sub-critical crack growth 
[Das and Scholz, 1981; Yamashita and Knopoff, 1987; 
Shaw, 1993], damage or static fatigue laws [Scholz, 
1968; Lee and Sornette, 2000]. For instance, the sub- 
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critical growth model of [Das and Scholz, 1981] pre- 
dicts that the time to failure t c of an aftershock de- 
creases with the stress change da according to 

t c ~ d<j- n , (2) 

where n is the stress corrosion index. If the stress 
change decreases with the distance r from the main- 
shock according to 

da{r) ~ r- 1/2 (3) 

in the near field (at distances from the fault smaller 
than the mainshock rupture length) , then the average 
time to failure increases with r as 

t c (r) ~ r n ' 2 (4) 

By inverting r as a function of t, the typical distance 
R(t) of aftershocks occurring at time t after the main- 
shock is thus expected to increase according to [Hue 
and Main, 2003] 

R(t) ~ t H , (5) 

with H = 2/n. Expression (5) thus predicts a very 
slow sub-diffusion, with a diffusion exponent H = 
2/n w 0.06 for a corrosion index close to 30 as gen- 
erally observed [Hue and Main, 2003]. The diffusion 
exponent is even smaller at large distances from the 
mainshock because the stress decreases as ~ 1/r 3 and 
thus H = 1/ (3n) . If the time to failure decreases ex- 
ponentially with the applied stress, as expected for ex- 
ample for static fatigue laws [Scholz, 1968], the typical 
distance R increases logarithmically with time. Such 
a slow diffusion is difficult to observe in real data due 
to the limited number of events and the presence of 
background seismicity. Take H — 0.2 for instance: 
one needs five decades in time range to observe one 
decades in space range. The smallness of the diffu- 
sion exponent H predicted by this and other models 
(see Appendix A) explains why it may be difficult to 
observe it. 

Another alternative explanation of aftershock dif- 
fusion, that does not rely on any stress diffusion pro- 
cess, is multiple triggering. Ouchi and Uekawa [1986] 
first reported that the diffusion of aftershocks is of- 
ten due to the occurrence of large aftershocks and 
the ensuing localization of secondary aftershocks close 
to them. The apparent diffusion of the seismicity 
may thus result from a cascade process; the main- 
shock triggers aftershocks which in turn trigger their 
own aftershocks, and so on, thus leading to an expan- 
sion of the aftershock zone. A recent series of papers 



have investigated quantitatively this concept of trig- 
gered seismicity [Helmstetter and Sornette, 2002a, b,c; 
2003a; Helmstetter et al, 2003] in the context of the 
so-called epidemic-type aftershock (ETAS) model of 
earthquakes seen as point- wise events, which was in- 
troduced by Ogata [1988] and in a slightly different 
form by Kagan and Knopoff [1981; 1987]. The ETAS 
model assumes that each earthquake triggers after- 
shocks, with a rate that decays in time according to 
Omori's law, and which increases with the mainshock 
magnitude. The distribution of distances between 
triggered and triggering earthquake is assumed to be 
independent of the time. Using this model, Helm- 
stetter and Sornette [2002b] showed that, under the 
right conditions (see Appendix A), the characteris- 
tic size R of the aftershock cloud may increase as a 
function of time t since the mainshock according to 
expression (5), where H is a function of both the ex- 
ponent of Omori's law and of the exponent describing 
the spatial interactions between events (see Appendix 
A). Figure 1 presents results from numerical simula- 
tions of the ETAS model to show how cascades of 
multiple triggering can produce aftershock diffusion. 
The analysis of the ETAS model in [Helmstetter and 
Sornette, 2002b] offers some predictions that can be 
tested in real aftershock sequences. Diffusion should 
be observed only if the Omori exponent p is smaller 
than 1 , and the diffusion exponent H should decrease 
with p if p < 1. In addition to the diffusion law (5), 
the ETAS model also predicts a decrease of the Omori 
exponent p with the distance from the mainshock. 

The fundamental difference between the diffusion 
predicted by the ETAS model and by models based 
on stress weakening mechanisms is that, in the later, 
diffusion derives from the direct effect of the main- 
shock while, in the former, there is no diffusion of 
direct (first generation) aftershocks and diffusion re- 
sults from the cascade of secondary aftershocks. 

Motivated by these different empirical and theo- 
retical works, we revisit the issue of the existence of 
aftershock diffusion and of its characterization. For 
this, we develop two different and complementary 
methods to test these predictions. We analyze af- 
tershock sequences in California and interpret the re- 
sults in the lights of the predictions obtained from the 
ETAS model [Helmstetter and Sornette, 2002b] (see 
Appendix A for a brief description of the relevant pre- 
dictions). 

The organization of the article is as follows. We 
first discuss the problems encountered in previously 
published analyses of diffusion in real data. We then 
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explain and use a direct method of analysis of several 
aftershock sequences in California, followed by a sec- 
ond method based on wavelet analysis. This second 
method aims at optimizing the removal of the per- 
turbing influence of the scismicity background. We 
then discuss the limits of the theoretical and numeri- 
cal analysis, and interpret our empirical results in the 
light of various available models. 

Our analyses are carried out on two different cat- 
alogs, (i) the catalog of Southern California seismic- 
ity provided by the Southern California Seismic Net- 
work for the period 1932-2000, and (ii) the cata- 
log of Northern California seismicity provided by the 
Northern California Seismic Network since 1968. The 
minimum magnitude for completeness ranges from 
M = 3.5 in 1932 to M < 2 for the two catalogs 
since 1980. The average uncertainty on earthquake 
location is about 1 km for epicenters, but is larger for 
hypocenters. Therefore we consider only the spatial 
distribution of epicenters. 

2. Methodological Issues 

A priori, the observed seismicity results from a 
complex interplay between tectonic driving, fault in- 
teractions, different spatio-temporal field organiza- 
tions (stress, fluid, plastic deformation, phase trans- 
formations, etc.) and physical processes of damage 
and rupture. Notwithstanding this complexity, from 
the viewpoint of empirical observations, one may dis- 
tinguish two classes of seismicity: the seismicity re- 
sulting from the tectonic loading which is often taken 
as a constant source (uncorrelated seismicity) and the 
triggered seismicity resulting from earthquake inter- 
actions (correlated seismicity). An alternative clas- 
sification of scismicity defines earthquakes as either 
forcshock, mainshock, aftershock or background seis- 
micity. The definition of aftershocks and foreshocks 
requires the specification of a space-time window used 
to select foreshocks (resp. aftershocks) before (resp. 
after) a larger event defined as the mainshock. The 
background is the average level of seismicity prior to 
the mainshock or at large times after the mainshock. 
Of course, such classification is open to criticism in 
view of the inescapable residual arbitrariness of the 
selection of foreshocks, mainshocks and aftershocks 
[Helmstetter et at, 2003; Helmstetter and Sornette, 
2003]. Nevertheless, it offers a useful reference for 
testing what is measured in earthquake diffusion pro- 
cesses. 

Following previous studies, our analyses presented 



below assume that earthquake occurrence follows a 
point process, because we deal with the space and 
time organization of aftershock epicenters. Physically, 
this would be justified when all spatial and temporal 
scales are larger than source rupture dimension and 
duration. In reality, this is not the case and the point 
process representation may lead to incomplete or mis- 
leading conclusions, as we point out in section 6.3. 

2.1. Effect of uncorrelated seismicity 

The major problem when analyzing real seismic- 
ity data in search of some evidence for diffusion 
comes from the mixture of correlated seismicity which 
can display diffusion and of uncorrelated seismicity 
(noise). The latter can significantly alter the evalu- 
ation of the characteristic distance of the aftershock 
zone. To illustrate this problem, we analyze in Figure 
2 a synthetic catalog generated by superposing a large 
number of independent aftershock sequences, without 
adding a constant scismicity background. This syn- 
thetic catalog has been generated by superposing 1000 
mainshocks with a Poissonian distribution in time and 
space and with a Gutenberg-Richter distribution of 
magnitudes, and by adding aftershocks sequences of 
each mainshock, with a number of aftershock increas- 
ing as 10 8M with the mainshock magnitude M. If 
we study, as in previous studies, the average distance 
between all pairs of events in the catalog as a function 
of their average time difference, the superposition of 
uncorrelated aftershock sequences induces an appar- 
ent and spurious diffusion, over almost four orders 
of magnitude in time. This apparent diffusion comes 
from the superposition of a large number of aftershock 
sequences, with a power-law distribution of duration 
and spatial extension, resulting from the increase of 
the number of aftershocks and of the length of the 
aftershock zone with the mainshock magnitude, to- 
gether with a power-law distribution of earthquakes 
sizes (Gutenberg-Richter law). 

In order to study the diffusion of aftershocks, trig- 
gered directly or indirectly by the same source, one 
strategy is to study individual aftershock sequences 
of large earthquakes selected by some clustering al- 
gorithm, in order to remove the influence of uncorre- 
lated seismicity. This is the approach followed in the 
present paper. 

2.2. Discussion of the method of Marsan et 
al. 

In contrast, Marsan et al. [Marsan et at, 1999; 
2000; Marsan and Bean, 2003] have proposed a dif- 
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ferent method, which uses all available seismicity and 
considers all pairs of events independently of their 
magnitude, with the advantage that (1) the corre- 
sponding data set is much larger than for individual 
aftershock sequences and (2) their method does not 
require the sometimes arbitrary definition of main- 
shocks and aftershocks. In principle, their approach 
provides a systematic search of possible space-time 
correlations between earthquakes. However, there is 
a potential problem with their method stemming from 
the effect of the background seismicity and of uncor- 
rected seismicity, as illustrated in figure 2. Indeed, 
they study the average distance between all points 
as a function of the time between them. Therefore, 
their count is performed over a mixture of very differ- 
ent earthquake populations, a significant fraction of 
events being probably unrelated causally. 

In order to remove the influence of this uncorre- 
cted seismicity, they use the global catalog to esti- 
mate the average distance between two points, and 
they remove the contribution of the average seismic- 
ity to estimate the spatio-temporal distribution of the 
correlated seismicity. Specifically, Marsan et al. cal- 
culate a pair-wise time-dependent space-space corre- 
lation function corrected for the background by sub- 
tracting the long-time average value of this space- 
space correlation function, from which they estimate 
an average distance between pairs as a function of 
time. The growth of this average distance with time 
may then qualify a causal dependence between earth- 
quakes, through a diffusive process. Marsan and oth- 
ers [Marsan et al, 1999; 2000; Marsan and Bean, 
2003] studied in this way several catalogs at different 
spatial scales, from the scale of a mine to world-wide 
seismicity, and observed that the average distance be- 
tween two earthquakes increases as a power-law of the 
time between them, with an exponent often close to 
H = 0.2, indicative of a sub-diffusion process. They 
interpreted their results as a mechanism of stress dif- 
fusion, that may be due to fluid transfer with hetero- 
geneous permeability leading to sub-diffusion. 

In this sense, Marsan et al. do not try to establish a 
causal dependence between earthquakes but rather to 
detect a correlation in their spatio-temporal organi- 
zation. Recall that correlation shows that two events 
are related, but it does not determine their cause and 
effect relationship. This is because there are basi- 
cally three possible explanations for the observation 
of correlation: (i) The correlation is a coincidence; (ii) 
One event causes the other; (iii) The two events are 
both caused by some third event. In their detection 



of spatio-temporal correlations in seismicity, Marsan 
et al. do not distinguish between explanation (ii) nor 
(iii) while in contrast the present paper emphasizes 
explanation (ii). 

We have performed many tests of Marsan et al.'s 
method on synthetic catalogs. A typical test is shown 
as the diamond symbols in figure 2. At early times 
(t < 1 day), the average distance between earthquake 
pairs is constant, as it should be, and the method re- 
moves adequately the influence of uncorrelated seis- 
micity. But at large times when aftershock activity 
is small, the average distance exhibits large fluctua- 
tions as a function of time. This is due to the cor- 
rective term which becomes ill-conditioned at large 
times and leads to a large sensitivity to noise and fi- 
nite sample sizes. We have found that, in most of our 
synthetic catalogs constructed without genuine diffu- 
sion, the average distance obtained by Marsan et al.'s 
method is approximately constant at early times as it 
should, but then crosses over to another noisy plateau 
at long times, as a result of the ill-defined correction 
term. This noisy crossover from a constant to another 
higher plateau obviously gives an apparent growth of 
the average distance between pairs of earthquakes as 
a function of time, which may compete with or hide 
a genuine signal. It seems to us that several of the 
figures on the time dependence of the average dis- 
tance, which are presented in [Marsan et al, 1999; 
2000; Marsan and Bean, 2003], exhibit this feature of 
a more or less constant or very weak growth followed 
by a more abrupt jump to another noisy plateau. If 
one interprets Marsan et al. 's results in terms a causal 
diffusion processes, the evidence on diffusion provided 
in [Marsan et al, 1999; 2000; Marsan and Bean, 2003] 
is only suggestive and additional tests of the methods 
on synthetic and real data should be performed in or- 
der to understand its effects on the analysis of real 
data and to improve the correction term. Marsan et 
al.'s analysis is perhaps better interpreted as evidence 
of a spatio-temporal correlation of seismicity result- 
ing from viscous relaxation, stress pulses and other 
processes operating at large scales. But then, their 
analyses should not be interpreted using models of 
causal diffusion based on rate- and state-dependent 
friction law [Marsan et al, 2000] or on sub-critical 
crack growth [Hue and Main, 2003] , that model only 
mainshock-aftershock pairs and not arbitrary pairs of 
earthquakes. 
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3. Windowing method 
3.1. Method 

In this study, we analyze individual aftershock se- 
quences and consider the diffusion of the seismicity 
triggered directly or indirectly by the mainshock. We 
adjust the values of the time window T and the space 
window D used to select aftershocks so that the rate 
of background activity is negligible in comparison 
with the aftershock rate. The background seismic- 
ity rate is estimated by the average seismicity before 
the mainshock. We also adjust the minimum magni- 
tude m and the minimum time t min after the main- 
shock in order to obtain a catalog that is complete for 
tmin <t<T and m> m . 

In order to estimate the average size R of the after- 
shock area as a function of the time from the main- 
shock, we define the barycenter of the aftershock se- 
quence as the reference point because the mainshock 
epicenter has no reason a priori to be the source of 
the aftershock sequence. The average size of the af- 
tershock area R is obtained as the average distance 
between the aftershocks and the barycenter. We find 
that this procedure yields a diffusion exponent H r 
that is always a little larger than the diffusion expo- 
nent estimated from the average distance between the 
aftershocks and the mainshock epicenter, as done in 
Marsan et al. [1999; 2000], Marsan and Bean [2003] 
and Hue and Main [2003]. We also introduce a re- 
finement to take into account the anisotropy of the 
aftershock zone and the spatial extension of the main- 
shock rupture. For this, we compute the axes a and b 
of inertia of the whole aftershock sequence as a func- 
tion of the time after the mainshock, in the spirit 
of Kagan [2002]. Geometrically, this corresponds to 
approximating the map of aftershocks as filling an el- 
lipse with small axis b and large axis a. This provides 
two additional diffusion exponents H a and H^. For 
strike-slip mainshocks, b(t) measures the average dis- 
tance between aftershocks and the fault plane, while 
the large axis gives the average distance between af- 
tershocks and the barycenter on the fault plane. 

We also measure the Omori exponent by plotting 
the rate of aftershock activity as a function of time 
in a log-log plot, and by measuring the slope p by a 
linear regression for t m i n < t < T. We have also used 
a maximum likelihood method to estimate both the 
p and c values of the modified Omori law. In most 
cases, the two methods provide similar values of p. 
We also estimate the variation of p with the distance 
r between the mainshock and its aftershocks by se- 



lecting aftershocks at different distances between the 
mainshock. As described in Appendix A, a prediction 
of the ETAS model concerns the modification of the 
distribution of distances r between aftershocks and 
the mainshock epicenter with time. We plot the dis- 
tribution of distances r between the mainshock and 
its aftershocks for several time windows to test if there 
is an expansion of the aftershock area with time. 

We have tested this method using synthetic cata- 
logs generated with the ETAS model, including a con- 
stant seismicity background. We have checked that 
our method provides a reliable estimate of the diffu- 
sion exponent and is almost insensitive to the back- 
ground activity as long as the duration of the time 
window is sufficiently short so that the seismicity is 
dominated by aftershocks. 

3.2. Results 

We have analyzed 21 aftershock sequences of ma- 
jor earthquakes in California with number of after- 
shocks larger than 500, in addition to two aftershock 
sequences of large earthquakes (Kern-County, M=7.5, 
1952, and San-Fernando, M=6.6, 1971) which have 
less than 500 events. The results for all these 21 se- 
quences are listed in Table 1. The different values 
of H, measured either by the average distance from 
the barycenter (H r ) or from the inertial axes (H a and 
Hi,), generally give similar results. Exceptions are the 
Morgan Hill, Loma-Prieta, and Joshua- Tree events 
which have Ht > H ai and the Imperial Valley event 
which gives Hb < H a . The diffusion exponent H is 
always positive, and generally smaller than 0.1. The 
estimated standard deviation on H, measured for syn- 
thetic catalogs with the same number of events and 
similar p-values, is about 0.05. Thus, most aftershock 
sequences do not exhibit significant diffusion. 

Details of the analysis for the aftershock sequence 
following the 1992 M — 7.3 Landers event are shown 
in Figures 3 and 4. For this sequence, we measure an 
Omori exponent p = 1.1, which is stable when look- 
ing at different distances r. The characteristic cluster 
size is also stable over more than two orders of mag- 
nitude in time leading to H w 0. Similar results are 
obtained for the elliptical axes a and b. The analysis 
of the distance distribution at different times also con- 
firms that there is no discernible diffusion of seismic 
activity. 

Figure 5 summarizes the results for p and H listed 
in Table 1. The first result we should stress is that 
all our values of the diffusion exponent H are quite 
small when compared with previous studies [Marsan 
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et at, 1999; 2000; Marsan and Bean, 2003]. For the 
reasons explained in section 2, we believe that Marsan 
et al.'s results may have been affected by the back- 
ground seismic activity, and are quantifying different 
processes. One can observe a weak negative corre- 
lation between p and H for p < 1 but this negative 
correlation disappears when including data with p > 1 
which gives a very strong scatter with two among the 
largest H obtained for the largest p > 1 . 

4. Wavelet method 

The detection of aftershock diffusion may be bi- 
ased (i) by other (possibly) independent aftershock 
sequences and (ii) by background seismicity. In the 
windowing method, we have addressed these prob- 
lems by analyzing (a) individual aftershock sequences 
of large earthquakes over a space-time window (b) 
with seismicity rate much larger than the background 
estimated over a ten-year average. 

We now present a second method based on wavelet 
analysis, which introduces two innovations. First, 
it uses a smoothing kernel (or mother wavelet) con- 
structed in order to remove Poissonian background 
seismicity. Second, it uses the expected scaling re- 
lationship relating space and time that characterizes 
diffusion to derive scaling laws obeyed by the wavelet 
coefficients. The exponents p of the observed Omori 
law and H of aftershock diffusion are then obtained by 
optimizing the compliance of the wavelet coefficients 
to these scaling laws. The wavelet approach displays 
the data all at once in the space and time-scale dimen- 
sions in order to determine p and H simultaneously. 
In contrast, the windowing method displays the data 
twice, once in the time dimension to obtain the ex- 
ponent p and a second time to obtain R(t) and the 
exponent H. The wavelet method can thus be seen 
as a simultaneous two-dimensional time scale-space 
determination of p and H, in contrast with the previ- 
ous windowing method consisting in two independent 
one-dimensional time series analyses, one for p and 
the other for H. 

4.1. General diffusion scaling relation 

In the following, we will assume that a main event 
occurs at time t = 0, and is followed by a cascade 
of aftershocks, superimposed on the long term back- 
ground seismicity rate, which is for now considered 
as a set of Poissonian, independent events. We thus 
neglect the possible interactions between the back- 
ground noise and the cascade events, and that inde- 



pendent events trigger their own cascades. We will 
come back to this point at the end of the derivation 
and show that it is a reasonable assumption from a 
geophysical point of view for short catalogs such as 
those we consider. At any time t following the main- 
shock, the spatial and temporal evolution of the seis- 
micity rate n(r, t) can be approximated by 

n(r, t) dr dt = A{r, t) dr dt + B(r, t) 2irr dr dt , (6) 

where r is the spatial distance from the mainshock 
epicenter. A(r, t) is the aftershock rate per unit time 
and per unit distance from the mainshock: A(r,t)dr 
is thus the number of events per unit time at time t 
within an annulus of radii r and r + dr. In the follow- 
ing, a Poissonian distribution will be assumed for the 
temporal structure of the background seismicity rate 
B(r,t) per unit time and surface and no restrictions 
will be imposed on its spatial structure. This is an 
interesting aspect of the present wavelet method. 

The term A(r, t) reflects the spatio-temporal struc- 
ture of the Omori law, describing the relaxation of the 
stress and of other physical fields, which occurs in the 
vicinity of the main source and beyond. The qualify- 
ing signature of diffusion is expressed by the following 
general scaling relationship coupling time and space: 

A(r, t)drdt = Q^jf (^) drdt , (7) 

where H is the diffusion exponent, p is the Omori 
law exponent and the diffusion constant D is such 
that Dt H has the dimension of a length. Q is a 
constant and / is an integrable function which de- 
pends on the physics of the diffusion processes. The 
1 /t p+H prcfactor stems from the requirement that the 
integral of A(r, t) over all space should retrieve the 
Omori law ~ l/t p . For H = 0, no diffusion occurs, 
while the value H = 1/2 gives standard diffusion. 
H > 1/2 characterizes a superdiffusive regime and 
H < 1/2 corresponds to a subdiffusive regime which 
is the regime relevant to aftershocks. 

The problem is that the background term B(r,t) 
in (6) is not of the form (7) and therefore may spoil 
the detection of diffusion. In other words, the spatial 
and temporal structure of B(r, t) scrambles the signal 
A(r, t). We now describe the wavelet approach that 
addresses this problem by minimizing the impact of 
B(r,t). 

4.2. Kernel smoothing in time 

Introducing the temporal kernel or mother wavelet 
W(t), we consider the wavelet transform of the signal 
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N(R,t) (where N(R,t) is the spatial integration of 
n(r, t) for r between and R) computed at time t = 
and time scale a using the wavelet W 



C a{ R) = ll° N { R,t)w{^ dt 



(8) 



As each event can be considered as a Dirac function 
in time, the integral in (8) reduces to a discrete sum. 
The scale factor a allows us to dilate or contract the 
kernel W in order to get insight into the temporal 
structure of N(R, t) at various time scales a. 

We use a kernel with zero average so that any sta- 
tionary process S(R, t) uncorrelated in time does not 
contribute to C a (R): 



dt = 



(9) 



Here, S(R,t) is the rate of background events oc- 
curring within a circle of radius R at time t. This 
property allows us to get rid of the background seis- 
micity without presuming anything about its spa- 
tial structure, nor about the specific time occurrence 
of such events. In this way, background events are 
erased on average without needing to identify them 
in the seismic catalog. This is an important quality 
of the present wavelet analysis compared with previ- 
ous studies of aftershocks. Strictly speaking, for the 
background seismicity to disappear by this procedure, 
we need (i) to consider large mainshocks, (ii) to have a 
small probability that a large event due to the back- 
ground is generated during any aftershock sequence 
and (iii) to assume that the number of events in the 
triggering cascade generated by any background event 
is low and that all these events have small magnitudes. 
These conditions ensure that the triggering cascades 
resulting from the background S(R, t) do not interact 
with the cascade of aftershocks induced by the main- 
shock and that the duration of S(R, i)-induced cas- 
cades is short compared with any time scale a used 
in the wavelet analysis. These conditions would be 
too restrictive if applied to the whole span of a seis- 
mic catalog, but we expect them to be approximately 
realized over the short time span of each aftershock 
sequence whose duration is generally observed to be 
no more than the order of weeks to years. 

Now, taking the wavelet transform of A(r, t) given 
by (7) gives 



C a (R) 



= QDa- p f 
Jo 



t-p G 



R/a H 



W(t) dr , 
(10) 



where G is the integral of / and t = t/a. Expression 
(10) implies the scaling law 



C a (R) =a- p d(R/a H ) 



(11) 



which relates the wavelet coefficient at time scale a 
of the time series of earthquake rate within a circle 
of radius R centered on the mainshock to the wavelet 
coefficient for time scale 1 and radius R/a H , by a 
simple normalization by a~ p . We will refer hereafter 
to this scaling law as the iJ-scaling law. This law (11) 
implies that, for any possible different values of a and 
R, plotting a p C a (R) as a function of R/a H leads to a 
collapse of all points onto a single "master" curve. 
Expression (11) can be transformed into 



R*C a (R) = C aR - 1/H (l) 



(12) 



which now provides a relationship between the wavelet 
coefficient computed at time scale a and radius R 
and the wavelet coefficient computed at time scale 
aR~ x ^ H for a unit radius. This scaling law (12) will 
thereafter be referred to as the 1/iJ-scaling law. Ex- 
pression (12) shows that, for any possible different 
values of a and R, plotting RuC a (R) as a function of 
aR-i/H i eac [ s t Q a collapse of all points onto another 
single "master" curve. 

Both scaling laws (11) and (12) capture mathe- 
matically in a universal way the possible diffusion of 
aftershocks around their mainshock. These scaling 
laws give access to the diffusion exponent H but do 
not say anything on the shape of the "master" curves, 
which should derive from the specific properties of 
f(r/Dt H ) and of W{t). The interest in using the two 
scaling laws (11) and (12) is that they magnify and 
thus stress differently the small and long time and 
well as the short and large distance part of the data. 

The 1 ///-scaling law (12) must have a master curve 
with two asymptotes. If R S> Da H (i.e., for small 
normalized time scales compared to the radius), it 
can easily be shown that the master curve is a power- 
law of exponent —p. This result is compatible with 
the exponent expected for the whole sequence, that is, 
for the Omori law describing the decay of the seismic 
rate of all events within a circle of infinite radius R. 
If R <C Da H , then the master curve is again a power 
law, but with exponent — (p + H) . The computation 
of C a {R) for small and large i?'s should thus provide 
p and H . We will use below a more sophisticated 
method that uses any range of R values. 
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4.3. Choice of the smoothing kernel 

Our choice of the kernel or mother wavelet W has 
been dictated by the following considerations. First, 
the modified Omori law (1) as well as the ETAS model 
(16) introduce a short-time cut-off c that accounts for 
seismicity just after the mainshock. Since this cut-off 
breaks down the exact self-similarity of the Omori 
kernel, it leads to corrections to the expected diffu- 
sion resulting from event triggering cascades at short 
times [Helmstetter and Sornette, 2002b]. Since c is in 
general found small, the seismicity rate is very large 
just after the mainshock which may lead to finite- 
size and instrumental saturation effects just after the 
mainshock. In addition, even if these limitations are 
removed there may be so many aftershocks near t = 
that many of them are generally interwoven in seis- 
mic recordings, and most of them can not be inter- 
preted and located properly. We thus construct a 
wavelet kernel W which minimizes the weight given 
to the seismicity rate at short times after mainshocks. 
Specifically, we choose the wavelet kernel shown in 
Figure 6, which vanishes at t = 0, has also zero time 
derivative at the origin and has zero average: 

W(t) = (M 2 -t*)cM-j)- (13) 

This kernel looks like an aliased sine function. This 
wavelet kernel is reasonably well localized both in 
time and scale and its simple expression allows for 
fast computations on large data sets. The time unit 
shown in this figure 6 has no special meaning since our 
wavelet analysis is using scale ratios rather than on 
absolute scales. Note that the chosen wavelet kernel 
has properties quite different from those, for instance, 
of the classical Mexican hat wavelet, widely used in 
the analysis of singularities, which has a maximum 
amplitude at t — 0, and would thus be inappropriate 
for the reasons listed above. 

This choice of the wavelet kernel leads to the decay 
of the wavelet coefficients C a {R) ~ 1/a 3 at large times 
after the last events in the catalog, thus giving an 
apparent Omori exponent p = 3. As we know by 
experience that the Omori exponent is smaller than 2, 
an exponent p = 3 will thus be interpreted as spurious 
and due to the limited duration of the catalog. This 
asymptotic property, which is very different from the 
Omori law we are studying, is a desirable property of 
the wavelet kernel (13) which provides a clear signal of 
a possible problem in the data analysis. For example, 
if no event is present in the catalog between times t\ 
and t 2 , and provided that t 2 is sufficiently larger than 



t\, then we will measure a spurious p = 3 for time 
scales o larger than t\ and lower than t 2 . This may 
lead to some bias in the determination of p, that we 
shall come back to in the discussion. 

4.4. Synthetic tests 

The determination of the exponents p and H is 
performed by two algorithms described in Appendix 
B. This first (respectively second) algorithm uses the 
l/fZ-scaling (respectively if -scaling law) normalized 
curves. 

We have performed tests of the wavelet method on 
synthetic catalogs generated using the ETAS model 
and various modifications thereof, with and without 
genuine diffusion and with or without background 
seismicity. A particular result is that the wavelet 
method works better the larger the background noise, 
up to the point where the meaningful signal would 
disappear. While for large catalogs with a signifi- 
cant background component, the wavelet method is 
superior to the windowing method by providing more 
precise values of p and H, it turns out to be in- 
ferior to the windowing method for synthetic cata- 
logs without background when catalogs have limited 
sizes, as it exhibits significant scatter in the determi- 
nation of the exponents p and H. Technically, this 
scatters occurs due to the existence of large oscilla- 
tions in the dependence of the wavelet coefficients as 
a function of R and/or a, which makes difficult the 
determination of the relevant scaling intervals. This 
paradoxical result reflects the larger sensitivity of the 
wavelet method to the size of catalogs due to its in- 
trinsic two-dimensional nature, while the windowing 
method is more robust for small catalogs due to its 
one-dimensional structure. 

Comparing the if-method with the 1/if-method, 
we note that, due to the fact that the iJ-method uses 
original curves and not their fit approximations (and 
are thus more subjected to fluctuations), the "vari- 
ance landscapes" defined in Appendix B exhibit much 
more elongated valleys around minima than with the 
1/if-mcthod. As will be shown below, both meth- 
ods most often yield the same results, but the 1/H- 
method yields better-defined minima and should be 
preferred in general. 

4.5. Results 

Table 2 summarizes the results on the values of 
p and H obtained from the wavelet analysis applied 
to 21 large earthquakes in California. These events 
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are the same as those studied with the windowing 
method and reported in Table 1. Table 2 shows that 
results obtained with the two methods based on the 
_ff-scaling law and 1/ii-scaling law compare rather 
well, except for four cases: Westmorland, Round Val- 
ley, North Palm Springs, Mammoth Lakes. However, 
for each earthquake, a detailed analysis of the cor- 
responding contour lines of the average variance of 
the collapse of wavelet coefficients (see Figure 7 for 
the Round Valley mainshock) shows that the 1/H- 
scaling law method is in general more reliable with a 
more constrained and better defined minimum. We 
thus tend to believe the p and H values given by the 
1/ii-method as more reliable. 

Figure 8 shows the correlation between the expo- 
nents p and H obtained for each shock. There is 
a good consistency between both methods. Strictly 
speaking, negative values of H are associated with 
"anti-diffusion", that is, migration of aftershocks to- 
wards the mainshock. One possible reason for our 
finding of negative £Ts is a spurious bias due to the 
mismatch between the mainshock epicenter and the 
barycenter of the aftershock cloud. 

5. Discussion 

5.1. General synthesis 

Our main conclusion is that, in contrast with pre- 
vious claims, diffusion of aftershocks is in general ab- 
sent or very weak, at the borderline of detection. This 
conclusion is reached notwithstanding our significant 
efforts to develop two independent techniques which 
have been optimized for extracting a signal on diffu- 
sion in the presence of background noise. Maybe, to 
be fair, we should state more correctly that our rather 
negative conclusion is reached precisely because we 
have made large efforts to remove spurious signals. 
As many tests have shown, some of which presented 
in section 2, it is easy to construct a diffusion sig- 
nal in the form of a power law of the characteris- 
tic size of the aftershock cloud as a function of time. 
Our present work has stressed the importance of not 
jumping to conclusions and that serious tests should 
be performed to assess the reliability of a putative 
diffusive power law. The simplest explanation is of- 
ten that such a power law is due to cross-over effects 
in the presence of inhomogeneity of the catalogs, of 
their limited sizes and of the contagion induced by 
background and uncorrelated seismicity. We also note 
that most sub-diffusion exponents H reported in pre- 
vious studies as well as in the present one are very 



small, in the range 0.05 — 0.2. For instance, a value 
H — 0.1 implies that ten decades in time scales are 
needed for each decade in space scale. This "small 
exponent curse" is one of the many explanations for 
the difficulty in obtaining a clear-cut diffusion signal. 

Figure 14 compares the values of p and H obtained 
by the two methods given in Tables 1 and 2. The re- 
sults for H are essentially uncorrelated between the 
two methods. Considering they belong to the same af- 
tershock sequences, this supports the conclusion that 
the diffusion coefficients are not on the whole statis- 
tically significant. 

Having said that, Tables 1 and 2 show that a few 
aftershock sequences exhibit a significant and unam- 
biguous diffusion. In order to aggregate the informa- 
tion derived from the windowing and wavelet meth- 
ods, we compare the exponent Hb for the former 
and the exponent H obtained from the 1/iJ-scaling 
method for the latter and qualify the existence of dif- 
fusion (somewhat arbitrarily) when the two criteria 
Hb > 0.1 and H\ju > 0.05 are simultaneously ver- 
ified. Comparing these two methods, six clear-cut 
cases emerge: Westmorland (Hb = 0.12, H 1 / H = 
0.10), Morgan-Hill (H b = 0.44, H 1/H = 0.08), Round- 
Valley (H b = 0.11, H 1/H = 0.24), Superstition-Hill 
(H b = 0.10, H 1/H = 0.06), Joshua Tree (H b = 0.27, 
H 1/H = 0.08) and Mammoth Lakes (H b = 0.16, 
H\/h = 0.20). For the other sequences, either both 
windowing and wavelet analyses give a very small H 
or they strongly disagree with each other. The rea- 
son for the disagreement between the two methods 
when it occurs is not obvious to us. With a sin- 
gle method, one can quantify its systematic errors. 
Using two distinct methods and comparing them en- 
ables us to quantify the uncertainties resulting from 
causes that are difficult to assess a priori. This is our 
main justification for developing two distinct methods 
and for comparing their results. We note that such a 
strategy of using several models emphasizing different 
physical mechanisms with distinct implementations is 
well-known and largely used in meteorological fore- 
casts, precisely with the aim of accounting for the 
unknown or non-understood sources of uncertainties. 

It may be helpful to put these results in the light 
offered by the ETAS model [Helmstetter and Sornette, 
2002b] whose main predictions are summarized in Ap- 
pendix A. The most robust prediction is that one 
should expect aftershock diffusion when the Omori's 
exponent p is less than 1, because this value signals 
the existence of a cascade of triggering which is the 
mechanism at the origin of diffusion in the ETAS 
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model. In the above list of 6 clear-cut cases, three 
(Morgan-Hill, Round- Valley and Mammoth Lakes) 
have an exponent p smaller than 1 according to both 
methods, while the three others have a p- value larger 
than or very close to 1 . This fifty-fifty deadlock seems 
to discredit the usefulness of the ETAS model for this 
problem. It may be useful to look in more detail at 
the results of each method separately. 

5.2. Discussion of the results in the light of 
the ETAS model 

Most sequences shown in Table 1 for the window- 
ing method and in Table 2 for the wavelet method are 
in the regime p > 1 and are characterized by H « 0. 
These results are compatible with the predictions of 
the ETAS model that no diffusion should be observed 
if the Omori exponent is larger than 1. Indeed, a 
measured value p > 1 can be interpreted as belonging 
to the sub-critical regime v < 1 (where v is the aver- 
age number of events triggered per triggering event) 
such that the characteristic time t* defined by (17) 
is small and the relevant time window covers mostly 
the regime t > t* for which the cascade process is 
exhausted and diffusion is absent (Appendix A). 

As seen in Table 1 for the windowing method and 
in Table 2 for the wavelet method, a few aftershock se- 
quences are characterized by a small p < 1 exponent. 
As we have said, according to the ETAS model (see 
Appendix A), this is the relevant regime for observing 
aftershock diffusion. 

The Morgan-Hill sequence, analyzed in Figures 9 
and 10 and which has the smallest p- value with both 
the windowing method (p = 0.6) and the 1/H- wavelet 
method (p = 0.5), has a small but significant diffusion 
exponent measured with the windowing method. We 
obtain H r « H a « 0.1 estimated by the aftershock 
distances from the barycenter or the large elliptical 
axis, and a large value Hb = 0.44 obtained by using 
the time evolution of the small elliptical axes b which 
measures the average distance of aftershocks from the 
rupture plane. The value of H — 0.08 obtained by 
the wavelet l/7?-method is similar. The larger value 
Hb = 0.44 could be due to the fact that the diffu- 
sion perpendicular to the fault is less perturbed by 
the aftershocks along the whole length of the fault 
which occur in absence of genuine diffusion. For this 
aftershock sequence, the empirically determined val- 
ues of p = 0.6 and Hb = 0.44 would correspond for 
instance to 6 « 0.4 and fi w 1 (see equation (20)) 
according to the ETAS model [Helmstetter and Sor- 
nette, 2002b], where 9 and /j, are defined in equation 



(16) in Appendix A. 

The other sequences in Table 1 with p < 1 for 
which the cascade model predicts the existence of dif- 
fusion are Kern-County, Round- Valley, Oceanside and 
Mammoth Lakes (see table 1 for the corresponding 
parameters). Except for Oceanside, the correspond- 
ing p and Hb values are compatible with the predic- 
tion of the ETAS model [Helmstetter and Sornette, 
2002b] with fj, « 1 leading to H b w 1 -p. While these 
results are suggestive for the validity of the ETAS pre- 
dictions, more disturbing is the fact that large values 
of the diffusion exponents H are found for p > 1 (see 
Table 1). The Imperial Valley sequence is a case in 
point, with the largest p = 1.44 and large diffusion 
exponents H r = 0Al,H a = 0.37 and H b = 0.19. Its 
detailed analysis is shown in Figures 11 and 12. For 
this sequence, one can clearly observe an expansion 
of the aftershock area when comparing the distance 
distribution at different times. There is also a clear 
decrease of the exponent p with r as predicted by 
the ETAS model as a signature of the aftershock cas- 
cade leading to diffusion, but this should be associ- 
ated with p < 1 and not with p > 1 as found here. 
It is true that a significant diffusion exponent H > 
with p > 1 can be observed in the ETAS model in the 
crossover regime for t « t* where p is already larger 
than 1, but where a diffusion of seismic activity is 
still observed. Indeed, synthetic aftershock sequences 
in the sub-critical regime exhibit a diffusion of seis- 
mic activity which persists up to t w 100 t* even if 
the Omori exponent in larger 1. But the diffusion ex- 
ponent H in the crossover regime for t w t* should 
be smaller than in the early time t < t* regime when 
p is smaller than 1. We thus do not fully understand 
the origin of this discrepancy. 

Using the wavelet method, only five sequences are 
in the regime p < 1 where diffusion is predicted to oc- 
cur according to the ETAS model and, there, the ex- 
pected correlation is weak and noisy. The evidence of 
aftershock diffusion is very weak as 75% of the after- 
shock sequences seem to be in the non-critical regime 
(t > t* and p > 1) characterized by an absence of dif- 
fusion. The remaining five sequences are loosely com- 
patible with the existence of diffusion and the quan- 
titative values are consistent with the predictions of 
[Helmstetter and Sornette, 2002b], when taken col- 
lectively. Figure 13 tests the possible existence of a 
correlation between the diffusion exponent H and the 
Omori law exponent p obtained with the 1/H method 
for all the aftershock sequences described in table 2. 
The thick lines are the prediction of the ETAS model 
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[Helmstetter and Sornette, 2002b] that H « for 
p > 1 and i7 = (1 — p)/2 for p < 1, obtained by 
choosing [J. > 2 and by assuming p = 1 — 6 (which is 
correct if t <C t*). In the five cases with p < 1, we find 
positive exponents H in the range from 0.08 to 0.24. 
A linear fit over these five events with p < 1, forced 
to pass through the "origin" (p = l,H = 0) gives 
H = (p — l)/2.52, which must be compared with the 
prediction H = (1 — p)/2. In the other 15 cases with 
p > 1, H is very noisy with almost as many negative 
and positive values in the range from —0.15 to 0.1, 
and almost vanishing average. 

In conclusion, the results obtained with the two 
methods do not show any significant correlation be- 
tween H and p, in contradiction with the prediction 
of the ETAS model. This disagreement between the 
observations and the theory may result from the in- 
sufficient number of aftershocks available. The small 
number of events used yields a large uncertainty of the 
measure of H and p, as shown by the large difference 
in the value of p and H obtained for some sequences 
using the different methods (windowing, H and 1/H 
wavelet method). In addition, these are other factors, 
such as the fact that the typical mainshock-aftcrshock 
distance increases with the mainshock magnitude, or 
the geometry of the rupture, which are taken into ac- 
count neither in the ETAS model nor in the empirical 
analysis, but which may significantly alter the results. 

6. Factors limiting the observation of 
diffusion 

This section examines important limitations of 
both the theoretical analysis of Helmstetter and Sor- 
nette [2002b] and our present study of California seis- 
micity and discusses possible remedies. 

6.1. Independence between the mainshock 
size and the aftershock cluster size and 
selection rules 

A limitation of the analytical approach developed 
in [Helmstetter and Sornette, 2002b] is that the distri- 
bution of distances between a mainshock and its af- 
tershocks are assumed independent of the mainshock 
magnitude. However, it is a well established property 
of aftershock sequences that the size of the aftershock 
area is approximately proportional to the mainshock 
rupture length [Utsu, 1961; Kagan, 2002]. 

We can modify the ETAS model to include a 
dependence between the mainshock magnitude and 
the aftershock size, as observed in real seismicity, 



in order to take into account the extended rupture 
length of the mainshock. In this goal, we modify 
the distance distribution ~ l/(r + defined in 

(16) of Appendix A by taking the characteristic size 
d(M) proportional to the mainshock rupture length 
d(M) oc L ~ 10 5M . This means that an after- 
shock of generation i is created by an aftershock of 
generation i — 1 of magnitude Mj_i at a distance r 
from it via the rate (16) with the distance distribution 
~ l/ir + diM^y+v. 

We can understand intuitively the effect of intro- 
ducing a dependence between d and the magnitude, 
knowing the solution N(t, r) given by Helmstetter and 
Sornette [2002b] for a constant <i-value. Starting from 
a mainshock of magnitude M at the origin, the typ- 
ical distance between aftershocks of the first genera- 
tion and the mainshock is d(M) oc L, where L is the 
rupture length of the mainshock. Aftershocks of sec- 
ond and later generation have a smaller magnitude on 
average, otherwise they do not qualify as aftershocks 
of the first event, by our present definition. We can 
thus define an average rupture size (£) < L for the 
aftershocks. As long as the average size of the af- 
tershock zone R(t) is smaller than L, the aftershocks 
of the first generation dominate the spread of seis- 
mic activity following the mainshock and diffusion is 
not observable. At large times, when the aftershock 
area of second and later generations becomes larger 
than the influence zone of the mainshock of size L and 
when most aftershocks are triggered indirectly by af- 
tershocks of the mainshock, the diffusion for the cou- 
pled model should be the same as for the decoupled 
model studied in [Helmstetter and Sornette, 2002b]. 
The dependence of d with the progenitor's magnitude 
thus introduces a crossover time r such that there is 
no significant diffusion below r because the aftershock 
spatial expansion is dominated by aftershocks of the 
first generation while for time larger than r the multi- 
ple cascades of aftershocks dominates the spatial ex- 
cursion of aftershocks and we recover the diffusion law 
(5) of the decoupled model. This gives the equation 
for r: 

(l)(l) H *L, (14) 

leading to 

'-y <i5> 

where L is the mainshock size and (£) is the average 
rupture length of aftershocks. For large mainshocks 
L ^> {£} and slow diffusion, this characteristic time r 
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can be very large and, in practice, diffusion may be 
unobserved in real aftershock sequences. 

6.2. Rules of aftershock selection 

There are two other factors that may further weaken 
the observed diffusion in comparison with the predic- 
tion of the ETAS model given in [Helmstetter and 
Sornette, 2002b]. First, the constraint that the after- 
shocks must be smaller than the mainshock induces 
an undcr-estimation of the true diffusion exponent. 
If we include this rule of aftershock selection in the 
numerical simulations, the average seismicity rate is 
smaller than the predicted seismicity rate and de- 
creases as 1/ t 1+e at large times instead of the theoreti- 
cal prediction N(t) ~ 1/t 1 - for t < t* or n = 1. This 
effect induces a slower diffusion at large times when 
the observed Omori exponent becomes larger than 1. 
The other factor which may lower the diffusion in the 
real aftershock sequence is the introduction of a max- 
imum distance r max for the selection of aftershocks 
close to the mainshock. Introducing this rule in the 
simulations of the ETAS model obviously lowers the 
diffusion exponent by comparison with the true ex- 
ponent, because it rejects all aftershocks triggered at 
large distance from the mainshock which have an im- 
portant role in the diffusion process, especially for 
small value of fi < 2. The effect of the rupture geome- 
try and the rules of aftershock selection are illustrated 
in Figure 15 for numerical simulations of the ETAS 
model obtained for a = 0.5, b = 1, n = 1, n = 1, 
c = 0.001 day, m a = 0, M = 6, d = 0.01 x 10 5M 
km. This Figure shows that the combined effect of 
the dependence between the influence distance d and 
the rupture length L, and the rules of aftershock se- 
lection results in a strong reduction of the diffusion in 
comparison with the predictions given in [Helmstetter 
and Sornette, 2002b]. 

6.3. Influence of the fault geometry 

Another important limitation of the ETAS model 
and of other models of aftershocks such as the rate- 
and-state friction model [Dieterich, 1994] is that these 
models do not take into account the anisotropy of the 
spatial distribution of aftershocks nor the localization 
of earthquakes on a fractal fault network. These fac- 
tors may also have an important influence on after- 
shock diffusion. 

There is strong evidence that earthquakes occur 
on faults and faults are organized in a complex hi- 
erarchical network [Ouillon et at, 1996]. A simpli- 
fied representation of this network uses fractal geom- 



etry A better description of aftershock diffusion, if 
any, should thus deal with the problem of diffusion 
on a fractal network. Some general results have been 
obtained in the physical literature on this problem 
of diffusion of probe particles in non-Euclidean frac- 
tal spaces (see Bouchaud and Georges [1990], Sahirai 
[1993; 1994] and references therein). The main conse- 
quence is that the diffusion exponent H may be modi- 
fied to take into account the fractal geometry through 
the introduction of the so-called spectral dimension 
(which is often different from the geometrical fractal 
dimension). In general, this leads to reduced diffusion 
when measured with Euclidean distances. Another 
possible caveat is the anisotropy of fault networks, 
resulting from the localization properties of mechan- 
ical systems which tend to organize oriented shear 
bands. For instance, there is a very strong South- 
East to North- West preferred orientation of the San 
Andreas fault network in California. 

While previous empirical analyses including the 
present one as well as the ETAS model of cascades 
of triggering have neglected geometry, our tests sug- 
gest that geometry is a crucial ingredient. By geom- 
etry, one should include both the effect of a possible 
localization of a fraction of aftershocks on mainshock 
rupture faults and their localization of pre-existing 
hierarchical fault networks, as well as the dependence 
of the range over which aftershocks are triggered by 
the mainshock on the mainshock magnitude. 

7. Conclusion 

We have analyzed 21 aftershock sequences of Cali- 
fornia and found that the diffusion of seismic activity 
is very weak, when compared with previous studies. 
For most sequences, the spatial distribution of after- 
shocks is mostly limited to the mainshock rupture 
area. The rate of aftershocks is very small at dis- 
tances larger than the rupture length, even at large 
times after the mainshock. 

In the introduction, we noted that aftershock dif- 
fusion is predicted by any model that assumes that 
the time to failure increases if the applied stress de- 
creases. Our conclusion that aftershock diffusion is 
weak at best or non-existent suggests that the phys- 
ical process controlling the time of failure depends 
weakly on the magnitude of the stress change in the 
regime where this stress change is sufficient to trigger 
new events. 

We have examined the hypothesis that aftershock 
diffusion may result from multiple triggering of sec- 
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ondary aftershocks. In principle, this mechanism of- 
fers clear quantitative predictions, if one controls the 
parameter regime of the model. One problem is that 
our theoretical and numerical studies of cascade pro- 
cesses indicate that most predictions are very sensi- 
tive to small changes in the parameters of the seismic 
activity which cannot be easily determined from seis- 
micity catalogs. This variability of the ETAS param- 
eters from sequence to sequence may thus rationalize 
the variability of the diffusion exponent from one se- 
quence to another one. In conclusion, the large uncer- 
tainty on the estimation of the Omori exponent p and 
the diffusion exponent H, resulting from different fac- 
tors (small number of events, small available time and 
space scale, background seismicity and fault geome- 
try, high fluctuations from one sequence to another 
one), does not allow us to conclude clearly the valid- 
ity or the irrelevance of the mechanism of triggering 
cascade embodied by the ETAS model in describing 
aftershock diffusion. 
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Appendix A: Summary of main results 
of the ETAS model concerning 
aftershock diffusion 

The ETAS (Epidemic- Type Aftershock Sequence) model 
was introduced in Ogata [1988] and in Kagan and Knopoff 
[1981; 1987] (in a slightly different form) based on mu- 
tually exciting point processes introduced by Hawkes 
[Hawkes, 1971; 1972; Hawkes and Adamopoulos, 1973]. 
Contrary to what its name may imply, it is not only a 
model of aftershocks but a general model of triggered seis- 
micity. 

This parsimonious physical model of multiple cascades 
of earthquake triggering avoids the division between fore- 
shocks, mainshocks and aftershocks because it uses the 
same laws to describe all earthquakes without distinction 
between foreshocks, mainshocks and aftershocks. It is 
found [Helmstetter and Sornette, 2002a,b,c; 2003a; Helm- 
stetter et ai, 2003] to account surprisingly well for (i) 
the increase of rate of foreshocks before mainshocks (ii) 
at large distances and (iii) up to decades before main- 
shocks, (iv) a change of the Gutenberg-Richter law from 
a concave to a convex shape for foreshocks, and (v) the 



migration of foreshocks toward mainshocks. The emerg- 
ing concept is that the cascade of secondary, tertiary and 
higher-level triggered events gives rise naturally to long- 
range and long-time interactions, without the need for any 
new physics. This emphasis on cascades of triggered seis- 
micity provides a general understanding of the space-time 
organization of seismicity and offers new improved meth- 
ods for earthquake prediction. 

In the ETAS model, a main event of magnitude m trig- 
gers its own primary aftershocks according to the following 
distribution in time and space 



,(r,t) dr dt = K 10° 



9 c e dt (i d" dr 



(t + cy+ e (r + d)!+M ' 



(16) 



where r is the spatial distance to the main event (con- 
sidered as a point process). The spatial regularization 
distance d accounts for the finite rupture size. The power 
law kernel in space with exponent /i quantifies the fact 
that the distribution of distances between pairs of events 
is well described by a power-law [Kagan and Jackson, 
1998]. In addition, the magnitude of these primary af- 
tershocks is assumed to be distributed according to the 
Gutenberg-Richter law with slope 6. The ETAS model 
assumes that each primary aftershock may trigger its own 
aftershocks (secondary events) according to the same law, 
the secondary aftershocks themselves may trigger tertiary 
aftershocks and so on, creating a cascade process. The 
exponent 1 + 9 is not the observable Omori exponent p 
but defines the local (or direct) Omori law. The whole 
series of aftershocks, integrated over the whole space, can 
be shown to lead to a "renormalized" (or dressed) Omori 
law, which is the total observable Omori law [Helmstetter 
and Sornette, 2002a]. This global law is different from 
the direct Omori law l/(t + c) 1+e in (16): the observable 
(dressed) Omori exponent crosses over smoothly from the 
value 1 — 9 below a characteristic time t* to 1 + 9 at times 
much larger than t* 



t — c 



v r(i 



1/9 



(17) 



where the branching ratio v, defined as the average num- 
ber of aftershocks per earthquakes, is a function of the 
parameters of the ETAS model v — K b/(b — a). The 
renormalization of the direct Omori law with exponent 
1 + 9 into the observable dressed Omori law with expo- 
nent 1 — 9 (for t < t*) results from the cascade process. 
Intuitively, it is clear that the existence of cascades of sec- 
ondary aftershocks may lead to observable diffusion, anal- 
ogously to random walks whose succession of jumps create 
diffusion upon averaging. This analogy was established in 
[Helmstetter and Sornette, 2002b] which predicted differ- 
ent diffusion regimes according to the values of the model 
parameters. The simplest mathematical characterization 
of diffusion is through the evolution of the characteristic 
size ii of the aftershock cloud as a function of time t since 
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the main shock: 

R~t H , (18) 

where H is the diffusion exponent (equal to 1/2 for classi- 
cal diffusion). The theory and numerical simulations de- 
veloped in [Helmstetter and Sornette, 2002b] predict, for 
t < t* and 6 < 1 : 

H = 6/2 for n>2 (19) 
H = 6/fj, for fj, < 2 . 

where 

and a is proportional to the spatial regularization distance 
d in (16) up to a numerical constant function of fj, [Helm- 
stetter and Sornette, 2002b]. In all cases, the diffusion 
saturates progressively as t becomes much larger than t* . 
Here, the important message is that, despite the fact that 
time and space are uncoupled in the direct triggering pro- 
cess (16), the succession of cascades of events can lead 
to a macroscopic coupling of time and space, that is, to 
diffusion. This is illustrated in Figure 1 which presents 
results from numerical simulations of the ETAS model to 
show how cascades of multiple triggering can produce af- 
tershock diffusion. Since the condition t <t* also ensures 
that the observable Omori exponent 1 — 6 is different from 
the direct exponent 1 + 8, this triggering cascade theory 
predicts that diffusion should be observed most clearly 
when Omori's exponent p is smaller than 1. Note that, 
for p close to 1 as often observed empirically, that is for 9 
small, the predicted diffusion exponent H is significantly 
smaller than 1/2. The results in [Helmstetter and Sor- 
nette, 2002b] were obtained for a one dimensional process, 
but most results, in particular the diffusion law (18), are 
valid in any dimension. Another complication elaborated 
upon below is the fractal geometry of fault networks. 

A possible caveat in the predictions of the ETAS model 
given in [Helmstetter and Sornette, 2002b] is that we pre- 
dict only the average behavior of the space-time distri- 
bution of seismicity. In the regime where a/b > 0.5 rel- 
evant to real seismicity [Helmstetter, 2003], we observe 
huge fluctuations of the seismicity rate around the aver- 
age that may weaken the usefulness of predictions based 
on ensemble averages. Having said that, we have veri- 
fied by intensive numerical simulations that the prediction 
H > (1— p)/2 remains valid in all cases including the large 
deviation regime a > 6/2. Thus, for the problem treated 
here of detecting a possible aftershock diffusion, this issue 
is not a limiting point. 

Our tests on real aftershock sequences are interpreted 
in particular in the light shed by the ETAS model. We 
stress that the analysis of real data is much more diffi- 
cult than the study of synthetic sequences, due to the 
smaller number of earthquakes available, the presence of 
background activity, the effect of geometry and problems 



of catalog completeness especially just after large main- 
shocks. In addition, real seismicity is probably more com- 
plicated than assumed by the ETAS model. These limi- 
tations imply that it is difficult to obtain reliable quan- 
titative results on the diffusion exponent. However, a 
few qualitative predictions of the ETAS model should be 
testable in real data: 

• only sequences in the "early" time regime t < t* 
characterized by an Omori exponent p < 1 should 
diffuse; 

• the diffusion of seismic activity should be related to 
a decrease of the Omori exponent p as the distance 
r from the mainshock increases; 

• the characteristic size of the cluster is expected to 
grow according to expression (18) with the diffusion 
exponent H positively correlated with the #-value. 

Appendix B: Implementation of the 
wavelet method 

Method using the 1/iJ-scaling law normalized 
curves 

The determination of the exponents p and H is per- 
formed by using the following steps when using the 1/H- 
scaling law normalized curves: 

1. Compute C a (R) as a function of scale a for a series 
of given 7? values. Typically, we considerer a values 
varying with a multiplicative factor of 1.1, while R 
varies with a multiplicative factor of 1.01. This last 
value will be justified below. For each R value we 
thus obtain a curve showing the variation of C a (R) 
with a. Those curves are called 7i-curves. 

2. Select the range in scale a over which all 7?-curves 
display a power-law behavior as a function of a. 
Note that the exponent of the power law may vary 
from one R to another R (which is the hallmark of 
an underlying diffusion process). We could also se- 
lect a different range of a for each value of R, but 
this would drastically complicate the data process- 
ing. 

3. For each 7?-curve, fit C a (R) over the selected range 
in a by a power law. This step proves necessary as 
some data sets can sometimes display strong fluc- 
tuations which will ultimately alter the results (this 
has been checked on synthetic data sets). Each in- 
terval in a for each it-curve is now replaced by its 
power-law fit approximation. 

4. Choose a trial (p, H) and normalize each curve ac- 
cording to these exponents and their respective a 
and R values according to (12) (1 ///-scaling law). 
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The normalized time scale axis is then re-sampled 
using a geometrical sampling with a multiplicative 
factor of 1.1, so that all normalized R-curves are 
defined at common abscissae. 

5. For each normalized value of the time scale, search 
for all normalized power law approximations of R- 
curves which are defined at that value. Let us as- 
sume there are N s such segments, each one corre- 
sponding to a different _R-curve (thus R value) . 

6. Compute the average value of these normalized R- 
curves that are defined at the same normalized time 
scale, as well as the associated variance. If none 
or only one power law approximation of .R-curves is 
defined for the considered normalized time scale, the 
calculation is not performed as the variance cannot 
be estimated. Then, go to the next normalized time 
scale. 

7. When all values of the normalized time scale have 
been considered, compute the average of all the vari- 
ances that have been defined up to now. This aver- 
age variance gives us an estimation of how the var- 
ious normalized segments collapse on top of each 
other in the goal of defining a single master curve. 
It is approximately equivalent to the square of the 
average width defined by superimposing all the nor- 
malized curves. 

8. This algorithm is implemented using a (p, H) grid, 
with p varying from to 2 by steps of 0.01, while 
H varies within [—1 : 1], with the same step. When 
H = 0, no computation is made. A systematic 
search provides the couple of exponents (p, H) with 
the lowest average variance, i.e., the best collapse of 
the wavelet coefficients as a function of time scale 
and distance. For H — 0, the value of the variance 
for any couple (p, 0) is estimated as the mean of 
the variances obtained for (p, — 0.01) and (p, 0.01). 
This estimation has no real statistical meaning but 
is useful for representation purpose. The top left 
panel of Figure 7 constructed for the Round Val- 
ley mainshock show the contour lines representing 
equivalues of the average variance (quantified in log- 
arithmic scale in base 10). 

9. Once the best (p, H) couple has been found, the nor- 
malized wavelet coefficients C a (R) are calculated as 
a function of a using the original non-normalized 
real curves (and not their power-law fit approxima- 
tions) . This is performed for the purpose of visualiz- 
ing the predicted collapse of the wavelet coefficients 
on the real data-set, as shown in the left lower panel 
of Figure 7 constructed for the Round Valley main- 
shock. 

Note that this collapse method applied to the 1/H- 
scaling law can not work for H = strictly, since 1/H 



diverges. To address this minor technical problem, we 
choose a very small logarithmic step for the R values to 
allow us to consider H values as small as ±0.01. Indeed, 
the smaller is \H\, the more dilated is the normalized time 
scale axis. If H is too small, the scaling regions of two suc- 
cessive wavelet coefficients for two successive R values will 
undergo so much offset that they will not be defined on 
any common normalized time scale, preventing an estima- 
tion of an average variance. Considering small values of H 
thus necessitates the computation of wavelet coefficients 
for successive R with a very small step in R (hence the 
choice of a multiplicative factor of 1.01). 

Method using the iJ-scaling law normalized 
curves 

The determination of the exponents p and H is per- 
formed by using the following steps when using the H- 
scaling law normalized curves: 

1. We now re-organize all the C a (R) values by plotting 
curves of C a (R) as a function of R for various values 
of a. These are now the a-curves. 

2. The a-curves do not display any peculiar behavior 
with R. They are monotonically increasing with R, 
and saturate at large R values. Among those a- 
curves, there is always a subset of curves which are 
nearly parallel (or at least don't cross each other), 
which is a behavior predicted by diffusive processes. 
If all the curves are strictly parallel, one can con- 
clude there is no underlying diffusion at all. Other 
curves either cross this subset, or are simply too 
noisy and are thus eliminated. This allows us to 
select the a-curves we will use to invert for p and 
H using the .ff-scaling law. Note that we will con- 
serve "raw" a-curves, which can't be approximated 
by any power-law. 

3. Choose a trial (p, H) and normalize each curve ac- 
cording to these exponents and their respective a 
and R values according to (11) (H-scaling law). The 
normalized space scale axis is then re-sampled using 
a geometrical sampling with a multiplicative factor 
of 1.01, so that all normalized a-curves are defined 
at common abscissae. 

4. For each normalized value of the space scale, search 
for all normalized a-curves which are defined at that 
value. Let us assume there are N s such segments, 
each one corresponding to a different a-curve (thus 
a value). 

5. Compute the average value of these normalized a- 
curves that are defined at the same normalized time 
scale, as well as the associated variance. If none or 
only one a-curve is defined for the considered nor- 
malized space scale, the calculation is not performed 
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as the variance cannot be estimated. Then, go to 
the next normalized time scale. 

6. When all values of the normalized space scale have 
been considered, compute the average of all the vari- 
ances that have been defined up to now. This av- 
erage variance gives us an estimation of how the 
various normalized curves collapse on top of each 
other in the goal of defining a single master curve. 
It is approximately equivalent to the square of the 
average width defined by superimposing all the nor- 
malized curves. 

7. This algorithm is implemented using a (p, H) grid, 
with p varying from to 2 by steps of 0.01, while 
H varies within [—1 : 1], with the same step. When 
H = 0, no computation is made. A systematic 
search provides the couple of exponents (p, H) with 
the lowest average variance, i.e., the best collapse of 
the wavelet coefficients as a function of time scale 
and distance. In this case, there is no problem for 
H = 0. The top right panel of Figure 7 constructed 
for the Round Valley mainshock show the contour 
lines representing equivalues of the average variance 
(quantified in logarithmic scale in base 10). 

8. Once the best (p, H) couple has been found, the nor- 
malized wavelet coefficients C a (R) are calculated as 
a function of R using the original non-normalized 
a-curves. This is performed for the purpose of vi- 
sualizing the predicted collapse of the wavelet coef- 
ficients on the real data-set, as shown in the right 
lower panel of Figure 7 constructed for the Round 
Valley mainshock. 
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Table 1. Analysis of aftershock sequences of California with the windowing method. The first and second 
columns give the name and time of the mainshock. M is the mainshock magnitude, T and R are the temporal 
and spatial windows used to select aftershocks, M is the minimum magnitude of aftershocks, p is the Omori 
exponent measured over t min < t < T, N is the number of aftershocks. H r , H a and Hi, are the diffusion exponents 
measured using the average distance between aftershocks and the barycentcr, the large elliptical axis a, and the 
short elliptical axis b respectively. 
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Table 2. Analysis of aftershock sequences of California with the wavelet method. The first 
column gives the name of the mainshocks. p (H method) is the Omori's exponent obtained 
with the //-scaling law. The quantities a and R are discussed in the text, p (1/if method) 
is the Omori's exponent obtained with the 1/iJ-scaling law. H (H method) is the diffusion 
exponent obtained with the //-scaling law. H (1/H method) is the diffusion exponent obtained 
with the 1/iJ-scaling law. 
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a The H and p values cannot be estimated with the //-wavelet method for the San-Fernando sequence because there is 
no clear minimum of the variance. 
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Figure 1. Maps of seismicity generated by the ETAS model with parameters b = 1, 8 = 0.2, /x = 1, d = 1 
km, a = 0.5, c = 0.001 day and a branching ratio n = 1. The mainshock occurs at the origin of space with 
magnitude M = 7 (black star). The minimum magnitude is fixed at tuq = 0. The distances between mainshock 
and aftershocks follow a power-law with parameter fi = 1 and the local Omori law is oc l/t 1+e . According to the 
theory developed in the text, the average distance between the first mainshock and the aftershocks is thus expected 



to grow as R ~ t ^ 



j.0.2 



The two plots are for different time periods of the same numerical simulation, such 



that the same number of earthquakes N = 3000 is obtained for each graph: (a) time between and 0.3 days; (b) 
time between 30 and 70 yrs. At early times, aftershocks are localized close to the mainshock, and then diffuse and 
cluster around the largest aftershocks. 
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Figure 2. Analysis of a synthetic aftershock sequence. We have built a synthetic catalog by superposing a large 
number of independent aftershock sequences. We have used 1000 mainshocks, with a Poissonian distribution in 
time and space. Each mainshock generates only direct aftershocks with a rate given by (16), with K = 10, a = 0.8, 
9 = —0.1, c = 0.001 day, [i — 2 and d equal to the rupture length of the mainshock. The distribution of distances 
between aftershocks and mainshocks is thus independent of the time between mainshock and aftershocks. The 
global number of events in the catalog is 40000 including the 1000 mainshocks. The seismicity rate (a) displays 
several peaks corresponding to the occurrence of large mainshocks, as observed for California seismicity. The map 
(b) shows large clusters corresponding to the aftershock sequences of the largest mainshocks. The average distance 
between all pairs of events (c) shown as circles increases with the average time between events as R ~ t H with 
H = 0.15 (solid line), for a large interval of the time between events [0.001 — 10] days. For larger times, the average 
distance saturates to R « 80 km, half the size of the catalog. The diamonds show the results obtained with the 
method of Marsan et al. [2000]. At early times (t < 1 day), the average distance is constant and the method 
is effective to remove the influence of uncorrelated seismicity. But at large times when the aftershock activity is 
small, there are large fluctuations of the average distance, because the method is very sensitive to the noise. 
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Figure 3. Analysis of the June, 28, 1992, M — 7.3 Landers aftershock sequence, (a) Map of aftershocks, the 
mainshock epicenter is shown by a star, (b) Magnitude distribution for different time windows (time increasing 
from black to gray) showing that the magnitude distribution is stable over time, and that the catalog is complete 
above m — 2.2 after 3 days after the mainshock. (c) Rate of seismic activity as a function of the distance from 
the mainshock for different times after the mainshock (increasing time from top to bottom (black to gray)). The 
background activity preceding the mainshock is shown as a dashed line, (d) Rate of aftershocks for the whole 
sequence (solid black line at the top) and fit by an Omori law (dashed gray line), and rate of aftershocks for 
different distances from the mainshock (increasing distance from gray to black), (e) Characteristic size of the 
aftershock cluster as a function of the time from the mainshock, measured by the average distance from the 
barycenter (circles), or from the small ('+') and large ('x') inertial axes, (f) Variation of the Omori exponent with 
the distance from the mainshock. 



25 




Figure 4. Map of the aftershocks of the Landers earthquake, for different time windows with 1000 events in each 
plot, showing the stationarity of the spatial distribution of aftershocks. The epicenter is shown by a cross. The 
gray and black lines show the large and small elliptical axes respectively (multiplied by a factor 2 for clarity). 
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Figure 5. Diffusion exponents H as a function of the Omori exponent p for the aftershock sequences described in 
Table 1. The circles give the diffusion exponent H r evaluated with the mean distance from the barycenter (radius 
of gyration), the crosses correspond to the diffusion exponent H a and the '+' correspond to Hb- 




Figure 6. Wavelet kernel used for the wavelet analysis of aftershock diffusion, defined by (13). This wavelet kernel 
and its derivative both vanish at time t = and has a zero mean over the interval [0, +oo[. 
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Figure 7. Determination of the exponents H and p for the Round Valley mainshock using the l/_ff-scaling 
method (upper and lower left panels) and the H-scaling method (upper and lower left panels). The upper panels 
represent the contour plots in log 10 scale of equi- values of the average variance of the matching of wavelet coefficients 
(calculated as a function of time scale for different distances R; see text for details) as a function of trial values of 
p and H. The minimum determines our best estimation for H and p for this sequence. We get two estimates, one 
using the i7-scaling method (p = 0.35, H — 0.32) and another using the 1/ii-scaling method (p = 0.72, H = 0.24). 
Note that the minimum found by the 1/ii-scaling method is better defined than when using the H-scaling method. 
The lower panels show the resulting quality of the collapsed wavelet coefficients as a function of time scale for 
different distances R, using these best estimates of p and H. 
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Figure 8. Upper panel: correlation between the Omori exponent p obtained with the 1/H method (horizontal 
axis) and with the H method (vertical axis). The line of slope 1 is drawn as a reference. Lower panel: correlation 
between the exponents H obtained with the 1/H method (horizontal axis) and with the H method (vertical axis). 
The line of slope 1 is drawn as a reference. 
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Figure 9. Analysis of the Morgan-Hill aftershock sequence. Same legend as in Figure 3. 




Figure 10. Map of the Morgan-Hill aftershock sequence. Same legend as in Figure 4. 
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Figure 11. Analysis of the Imperial Valley aftershock sequence. Same legend as in Figure 3. 
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Figure 12. Same as Figure 4 for the Imperial Valley aftershock sequence. Note that the epicenter shown as a 
cross is far off from the locations where aftershocks cluster. This justifies our use of the aftershock barycenter as a 
more natural reference point for measuring diffusion. 
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Figure 13. Diffusion exponent H as a function of the Omori law exponent p obtained with the l/H method for 
all the aftershock sequences described in table 2. The thick lines are the approximative predictions of the ETAS 
model for fi > 2 (20), assuming that p = 1 — 9 if p < 1 and p = 1 + 6 if p > 1. 
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Figure 14. Comparison of the two methods of estimation of p and H . (a) Omori exponent measured by the 
wavelet 1/H method (Table 2) versus the Omori exponent estimated by the windowing method (Table 1). (b) 
Diffusion exponent measured by the wavelet 1/H method (Table 2) versus the diffusion exponent H r estimated by 
the windowing method (Table 1). 
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Figure 15. Average distance R(t) between a mainshock of magnitude M — 6 and its aftershocks for numerical 
simulations of the ETAS model with n = 1, a = 0.5, b = 1, c = 0.001, 8 = 0.2, too = and \i = 1, obtained 
by averaging over 1000 simulations. The circles show the results for d — 10 km, independently of the mainshock 
size, and without any constrain on aftershock selection. The diffusion exponent H = 0.17 is close to the prediction 
H = 6/ fj, = 0.2. The crosses correspond to another simulations of the ETAS model with the same parameters 
except that the characteristic distance d now depends on the magnitude of each event according to d = 0.01°' 5M . 
For this simulation, we have also selected aftershocks up to a distance of 100 km (10 times the mainshock rupture 
length) and we have rejected all aftershock sequences containing at least an event larger than the mainshock. All 
these factors weaken the diffusion by comparison the the prediction of the ETAS model. 



